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In this paper we propose and discuss variance reduction tech- 
niques for the estimation of quantiles of the output of a complex 
model with random input parameters. These techniques are based 
on the use of a reduced model, such as a metamodel or a response 
surface. The reduced model can be used as a control variate; or a 
rejection method can be implemented to sample the realizations of 
the input parameters in prescribed relevant strata; or the reduced 
model can be used to determine a good biased distribution of the 
input parameters for the implementation of an importance sampling 
strategy. The different strategies are analyzed and the asymptotic 
variances are computed, which shows the benefit of an adaptive con- 
trolled stratification method. This method is finally applied to a real 
example (computation of the peak cladding temperature during a 
large-break loss of coolant accident in a nuclear reactor). 



1. Introduction. Quantile estimation is of fundamental importance in 
statistics as well as in practical applications [Law and Kelton (1991)]. A 
main challenge in quantile estimation is that the number of observations can 
be limited. This is the case when the observations correspond to the output 
of expensive numerical simulations. Variance reduction techniques specifi- 
cally designed for this problem have been proposed and implemented. These 
techniques usually involve importance sampling [Glynn (1996)], correlation- 
induction [Avramidis and Wilson (1998)] and control variates [Hsu and Nelson 
(1987), Hesterberg and Nelson (1998)]. In this paper we focus our attention 
on variance reduction techniques based on the use of an auxiliary variable. 
For the problem of the quantile estimation of a computer code output, the 
auxiliary variable is the output of a reduced model, which is coarse but 
cheap from the computational time point of view. We will show how to use 
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it to find convenient parameters for tlie stratified and importance sampling 
teclmiques [Rubinstein (1981)]. 

Quantiles form a class of performance measures. Quantile estimation for 
a real- valued random variable (r.v.) Y aims at determining the level y such 
that the likelihood that Y takes a value lower than y is some prescribed value. 
We assume that Y has an absolutely continuous cumulative distribution 
function (cdf) F{y) = ¥(Y < y) and a continuously differentiable probability 
density (pdf) p{y). We look for an estimation of the a-quantile defined 
by F{ya) = a. 

In this paper we assume that Y = f{X) is the real- valued output of a 
computer code / that is CPU time expensive and whose input parameters 
are random and modeled by the random vector X E R"^ with known distribu- 
tion. With the advance of computing technology and numerical methods, the 
design, modeling and analysis of computer code experiments have been the 
subject of intense research during the last two decades [Sacks et al. (1989), 
Fang, Li and Sudjianto (2006)]. The problem of low or high quantile esti- 
mation (smaller than 10% and larger than 90%) can be resolved by classical 
sampling techniques such as the simple Monte Carlo or Latin hypercube 
techniques. These Monte Carlo methods can give imprecise quantile esti- 
mate (with large variance) if they are performed with a limited number of 
code runs, say, of the order of 200 [David (1981)]. An alternative approach 
is, to calculate a tolerance limit rather than a percentile by using Wilk's for- 
mula [Nutt and Wallis (2004)]. It provides with a low number of code runs, 
less than 200, say, a majoring value of the desired percentile with a given 
confidence level (e.g., 95%). But the variance of this tolerance limit is larger 
than that of the empirical estimate, for the same number of code runs. 

Another well known approach for the uncertainty analysis of complicated 
computer models consists in replacing the complex computer code by a re- 
duced model, called metamodel or response surface [Fang, Li and Sudjianto 
(2006)]. However, a low or high quantile estimate from a metamodel tends 
to be substantially different from the full computer model quantile because 
the metamodel is usually constructed by smoothing of the computer model 
output. Recently, some authors have taken advantage of one particular type 
of metamodel: the Gaussian process model [Sacks et al. (1989)], which gives 
not only a predictor (the mean) of a computer experiment but also a local in- 
dicator of prediction accuracy (the variance). In this context, Oakley (2004), 
Vazquez and Piera Martinez (2008) and Ranjan, Bingham and Michailidis 
(2008) have developed sequential procedures to choose design points and to 
construct an accurate Gaussian process metamodel, specially near the re- 
gions of interest, where the quantile lies. Rutherford (2006) proposes to use 
geostatistical conditional simulation techniques to obtain many realizations 
of the Gaussian process, which in turn can give a quantile estimate. How- 
ever, all these techniques are based on the construction of a Gaussian pro- 
cess model which can be difficult, albeit possible [Jones, Schonlau and Welch 
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(1998), Schonlau and Welch (2005), Marrel et al. (2008)], in the 
high-dimensional context (d > 10). Moreover, in industrial practice, a meta- 
model may already be available that comes from a previous study or from a 
simplified physical model. This is the situation we have in mind. We do not 
concentrate our effort on the construction of a more accurate metamodel, 
but on the use of a given reduced model. 

In our work we deal with this situation in which a reduced model is avail- 
able, in the form of a metamodel fr, which is a coarse approximation of 
the computer model /. The quality of the metamodel may not be known; 
the metamodel may be a simplified version of the computer code (a one- 
dimensional version for a three-dimensional problem, a response surface de- 
termined during another study, . . .); its input variables may be a subset of 
the input variables for the computer code /. The full computer model / is 
assumed to be very computationally expensive, while the evaluation of the 
metamodel fr and the generation of the input r.v. X are assumed to be very 
fast (essentially free). Therefore, the focus of this paper is on how to exploit 
the metamodel to obtain better control variates, stratification or importance 
sampling than could be obtained without it. In the real example we address 
in Section 5 (computation of the peak cladding temperature of the nuclear 
fuel during a large-break loss of coolant accident in a nuclear reactor), the 
CPU time for each call of the function / is 20 minutes, while the metamodel 
fr is a linear function and the input r.v. X is a collection of d = 53 inde- 
pendent real-valued r.v. with normal and log-normal distributions. In this 
example, we also study the relevance of using more complex and powerful 
metamodels, such as the popular Gaussian process model. 

The usual practice of quantile estimation is to construct an estimator of 
the cdf of Y first, then to deduce an estimator of the a-quantile of 1". In 
absence of the control variate, the standard method is the following one. 
The estimation is based on a n-sample (Yi, . . . ,Yn), that is to say, a set of 
n independent and identically distributed r.v. with the pdf p{y) of Y. The 
empirical estimator (EE) of the cdf of F is 

1 " 

(1) FEE{y) = -J2^y^<y^ 

1=1 

which leads to the standard estimator of the a-quantile 

(2) >EE(a) =ini{y,FEE{y) > «} = y{[an]): 

where [x] is the integer ceiling of x and Y(fc) is the kth. order statistics. 
Refined versions of this result based on interpolation and smoothing methods 
can be found in the literature [Dielman, Lowry and Pfaffenberger (1994)]. 
The empirical estimator 1ee(o) has a bias and a variance of order 1/n 
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[David (1981)]. The empirical estimator is asymptotically normal, 

(3) v^(yEE(a)-2/„)"-^AA(o,4E), 4E = ^Tr^- 

We remark that the reduced variance a"^^ is usually larger when a larger 
quantile is estimated (the pdf at ya is then very small). 

The outline of the paper is as follows. First we describe quantile estimation 
by control variate in Section 2. Section 3 presents an original controlled 
stratification method. Then, a controlled importance sampling strategy is 
analyzed in Section 4. A real example is addressed in Section 5. 

2. Quantile estimation by Control Variate (CV). In this section we present 
the well-known variance reduction techniques based on the use of Z = fr{X) 
as a control variate. The quantiles Za of Z are assumed to be known, as well 
as any expectation E[g((Z)] of a function of Z. We mean that these quantities 
can be computed analytically, or they can be estimated by standard Monte 
Carlo estimations with an arbitrary precision, since only the reduced model 
fr is involved. 

2.1. Estimation of the distribution function. A Control Variate (CV) es- 
timator of F{y) with the real-valued control variate Z has the form 

(4) Fcy{y) = FEE{y)-C{gn-n9{Z)]), 

where the function g( : R — > M has to be chosen by the user [Nelson (1990)] and 
9n = ^ SiLi fl'(-^i)- The optimal parameter C is the correlation coefficient 
between g{Z) and ly<j/ whose value is unknown in practice. Therefore, the 
estimated parameter C is used instead. It is defined as the slope estimator 
obtained from a least-squares regression of 'i-Yj<y on g{Zi): 

^ ^ E|_=i(ly,<y - FEE{y)){g{Zj) - gn) 



Y.U^g{z,)-gnY 

As shown by Hesterberg (1993), the estimator Fc\i[y) with the estimated 
parameter C can be rewritten as the weighted average 

n 

(5) Fcv{y) = Y.WjlY,<y 

i=i 



with 



^ _ 1 (gn-E[ff(Z)])(g^-ff(Z,)) 
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Note that E"=i Wj = 1. If g{z) = 1,<,^, then E[g{Z)] = a, 5n = Nq/u with 

n a 1 a 

(6) Aro = ^i^^<^^ and Wj = —lz,<z^ + ^^^j;^lz,>z^- 

As shown by Davidson and MacKinnon (1992), the estimator (5) is equiva- 
lent to the maximum likehhood estimator for probabilities. By using stan- 
dard results for the convergence of Monte Carlo estimators [Nelson (1990)], 
one finds 

V^(Fcv(y) - F{y)) AA(0, al^), 

(7) 

al^ = F{y){l-F{y)){l-p% 
where pi is the correlation coefficient between ly<j/ and '\-z<za'- 

ny<y,Z<z^)-aF{y) 



(8) PI 



v/F(y)(l-F(y))V^^ 



This result can be compared to the corresponding central limit theorem in 
absence of control, which claims that the empirical estimator Fee defined 
by (1) is asymptotically normal, 

V^{FEE{y) - F{y)) AA(0, 4e), 4e = F{y){^ " F{y))- 

Comparing with (7) reveals an asymptotic variance reduction of 1 — /of. 

2.2. Quantile estimation. Our goal is now to estimate the a-quantile of 
Y by using the CV estimator of the cdf of Y . We consider the order statistics 
(Yj-i), . . . , i^(n)) with the weights W^rj) defined by (6) sorted according to the 
Y(j). Using the estimator (5) of the cdf of Y, the CV estimator of the a- 
quantile is 

(9) ycv(a) = %), /^ = inf|j,^VF(,)>a|. 

Applying standard results for the variance reduction for Monte Carlo 
methods [David (1981)], one finds that this estimator is asymptotically nor- 
mal with the reduced variance ctq^, 

(10) V^(ycv(«)-y«)™AA(0,a2v), a2v = ^ilz^(i-p2)^ 

where p is the pdf of Y and pi is the correlation coefficient between lY<y„ 
and lz<z-- 



ny<ya,z<za)-a 

PI = 2 
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Comparing (10) with (3) reveals a variance reduction by the factor 1 — pj. 
As expected, the stronger Y and Z are correlated, the larger the variance 
reduction is. It is not easy to build an estimator of the reduced variance 
o"cV) because this requires to estimate the pdf p{ya)- However, it is possible 
to build an estimator of the correlation coefficient pj, which controls the 
variance reduction. This estimator is the empirical correlation coefficient pi 
defined by 



^^^^ VE-=l(ly,<,-^EE(?/))V&l(l^.<--^n(^-))' 
With Gn(^a) = iEr=l 1Z,<.„. 



'y=ycv(a) 



2.3. The optimal CV estimator. In the previous section the control vari- 
ate function is g{z) = lz<2^, which allows both an easy implementation and 
a substantial variance reduction. In general, the variance reduction obtained 
with the CV estimator (4) depends on the correlation coefficient between 
ly<y and the control g{Z). The optimal control, which maximizes the cor- 
relation coefficient, is obtained with the function [Rao (1973)] 

(12) g*{z)=ny <y\Z=z). 

This function is usually unknown, otherwise it could be possible to compute 
analytically the cdf F{y) by taking the expectation of g*{Z), and solve nu- 
merically the equation F{y) = a to get the quantile. However, this result 
gives the principle of refined CV methods using approximations of the op- 
timal control function g* . Continuous approximations have been proposed, 
that are however difficult to implement in practice [Hastie and Tibshirani 
(1990)]. Discrete approximations have been presented, which have been 
shown to be very efficient and easy to implement. We now describe the dis- 
crete method proposed by Hesterberg and Nelson (1998). Let us choose m + 
1 outpoints = ao < ai < • • • < Om = 1, and denote by — oo = < Za^ < 
••• < -Zom = oo the corresponding quantiles of Z. The intervals {zaj_i,Zaj] 
will be used as strata to construct a stepwise constant approximation of the 
optimal control. This construction is based on the straightforward expansion 
of the cdf of Y: 

m 

(13) F{y)=Y,P,{y){aj-aj.i), 

where Pj{y) is the conditional probability 

(14) Pj{y)=F{Y<y\Ze{za,_„Za,]). 
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The quantiles of Z are known, so the estimation of -F(y) is reduced to the es- 
timations of the conditional probabihties. The Poststratified Samphng (PS) 
estimator of F{y) is 

m 

-^ps(y) = ^Pj{y){pLj - Qj_i), 
i=i 

where 



Pjiy) 



En -I 



The PS estimator can be written as a weighted average of 'i-Y <y as weh. It 



can also be interpreted as a CV estimator with (Z) = lz<zaj ) J = li • • ■ > 
as control variates. Its variance is 

(15) Var(Fps(y)) 



1 / 1 \ 



Using Gaussian examples, Hesterberg and Nelson (1998) have shown that 
the optimal variance reduction (the one achieved with g*) can be almost 
achieved with the discrete approximation with two or three strata. Based on 
numerical simulations, the authors recommend to choose the cutpoint ai = a 
for the PS strategy with two strata. They also apply the strategy with three 
strata on some particular examples. In the next section we will show that 
we can go beyond the variance reduction obtained with the optimal control 
g*{Z) or its approximations by using the reduced model in a different way. 

3. Quantile estimation by Controlled Stratification (CS). The use of a 

reduced model to estimate directly the quantiles may be not efficient. Indeed, 
as mentioned in the introduction, the reduced model is usually a metamodel 
or a response surface that has been calibrated to mimic the response of the 
complete model f{X) for typical realizations of X, and not to predict the 
response f{X) for exceptional realizations of X. This is precisely what is 
sought when the purpose is to estimate quantiles. Besides, it is very difficult 
to give an estimate of the error when only the reduced model is used to 
estimate the quantiles. 

In this section we exploit the existence of a reduced model Z = fr{X) in 
a different manner than the CV strategy. The idea of the previous section 
was to use the reduced model as a control variate, or equivalently, post- 
stratification, without modifying the sampling. The idea of this section is to 
use it in order to implement nonproportional stratified sampling in which 
we do modify the sampling by rejection. The rough idea is to generate many 
realizations of X, to evaluate the corresponding reduced responses fr{X), 
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and to accept /reject the realizations depending on the responses fr{X). The 
complete model / will be used only with the accepted realizations. We can 
therefore enforce the numbers of realizations of X such that fr{X) lie in 
prescribed intervals, and increase the numbers of realizations in the more 
important intervals. 

3.1. Estimation of the distribution function. Let us choose m + 1 cut- 
points = Qo <«!<••• < am = 1 and denote by — oo = < Za-^ < • • • < 
Zcx^ = OO the corresponding quantiles of Z. As noted in the previous sec- 
tion, the cdf of Y can be expanded as (13), so the estimation of F{y) is 
reduced to the estimations of the conditional probabilities Pj (y) defined by 
(14). Let us fix a sequence of integers A''i, . . . ,Nm such that J2]=i = 
where n is the total number of simulations using the complete model /. 
For each j, we use the rejection method to sample Nj realizations of the 

input r.v. (Xj-"''*)i=i,...,Arj , such that the reduced output r.v. Z^-"'^ = fr{x\^^) 
lies in the interval {zcij_^,Za.]. For each of these Nj realizations, the out- 
put Y^^^ = f{x\^^) is computed. The conditional probability Pj{y) can be 
estimated by 



1 ' 

■I 1=1 




which gives the CS estimator of F{y) 



m 



(16) 



i^Cs(2/)=E^:/(y)(«J-«^-i)- 



The estimator Fcsiu) is unbiased and its variance is 



(17) Var(Fcs(y)) = E ^ ' [Pjiv) - Pjiv)]- 



If the number m of strata is fixed, if {Pj)j=i,...,m. is a sequence of positive real 
numbers such that J2JLi(3j = 1, and if we choose Nj = [n/3j], where [x] is 
the integer closest to x, then the estimator Fcsiu) is asymptotically normal: 



V^(^cs(y)-i^(y))™AA(o,a2s), 

(18) 



m r \2 



We first try to estimate the complete cdf F{y) for all y G R. 
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When Z is independent of Y (which means that there is no control), then 
we have Pjiy) = F{y) and 



(19) Var(Fcs(?/)) 



m / \2 



■J=l ■' 



X {F{y) - F{yf 



If we use a proportional allocation in the strata (3j = aj — aj-i, then Nj = 
[{aj — aj_i)n] and we find that the variance of the CS estimator is, modulo 
the rounding errors, equal to ^[F{y) — F{y)'^], which is the variance of the 
empirical estimator. 

When the r.v. Z is an increasing function of Y (i.e., to say, Z controls 
completely Y), then we obtain 

Var(Fcs(y)) = ^"^°""^°-^^' [p,o(^) -P3oiyf] < 

where jo is such that y G (yoj^.j, ya^^J- If we choose equiprobable strata aj = 

j/m and proportional sampling Pj = 1/m, then Var(i^cs(y)) ^ l/(4?7in). 
The variance of the CS estimator has therefore been reduced by a factor of 
the order of 1/m. 

Let us now look for the estimation of the tail of cdf F{y), in the region 
where F{y) ~ 1 — 5 with < (5 <C 1. If Z and Y have positive correlation, 
then it is clear that we should allocate more simulation points in the tail of 
the reduced model Z, so as to increase the number of realizations that are 
potentially relevant. 

As an example, we can choose m = 4, ai = 1/2, a2 = 1 — 25, = 1 — 5, 
Nj = n/4: for j = 1,...,4. Note that this particular choice allocates n/2 
points in the tail of the cdf of Z, where zi-2S < 

If Z and Y are independent, then equation (19) shows that this strat- 
egy involves an increase of the variance of the estimator by a factor 2: 
Var(Fcs(y)) — 26/n compared to the empirical estimator Var(FEE(2/)) — 
5/n. 

If Z and Y are so strongly correlated that the probability of the joint event 
Z < zi_2<5 a-iid Y > yi-s is negligible, then equation (19) shows that this 
strategy involves a variance reduction by a factor smaller than 26: Var(Fcs(y)) ^ 
2(5^ /n. This means that the variance reduction can be very substantial in 
the case where the variables Y and Z are correlated. 



3.2. Quantile estimation. Here we consider the problem of the estima- 
tion of the a-quantile of Y, with a close to 1. From the previous results, we 
can propose the estimator of the a-quantile of Y given by 



^cs(a) =inf{?/,-Fcs(?/) >«}, 
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where Fcs{y) is the CS esthnator of the cdf of Y. The esthnator Ycs{a) is 
asymptotically normal, 

If Z and Y are positively correlated, then it is profitable to allocate more 
points in the cdf tail of Z, so as to increase the number of potentially relevant 
realizations. 

In the following we carry out numerical simulations on a toy example in 
which n = 200 and a = 0.95. We apply the CS method with m = 4 strata de- 
scribed here above (ai = 0.5, 02 = 0.9, 03 = 0.95, Nj = n/A for j = 1, . . . ,4). 
We underline that this example is very simple and the reduced model could 
certainly be improved. In particular, a Gaussian process approach would 
provide a very good approximation with a few tens of simulations (see Sec- 
tion 5). The reduced model for this toy example has in fact been chosen so as 
to have approximately the same quality (in terms of correlation coefficients 
p and Pi) as the one we expect in the case of the real example addressed 
in Section 5. Our first objective here is to validate the CS strategy on this 
toy example for which we can check the CS estimations in terms of bias and 
standard deviation. Our second objective is to show that it can give good 
results with the parameters n = 200 and a = 0.95 even in the case in which 
Pi is relatively small, which is the context of the real example addressed in 
Section 5. 



Toy example. ID function. Let us consider the following configuration. 
X is assumed to be a Gaussian r.v. with mean zero and variance one. The 
functions / and are given by 

(20) fr{x)=x^, /(x) = 0.95x2[l + 0.5cos(10x) +0.5cos(20x)]. 

The quantiles of Z = fr{X) are given by Za = + a)/2)]^, where ^ is 

the cdf of the M{0, l)-distribution. The quantiles of Y are not known analyt- 
ically, as can be seen in the plot of the pdf of Y in Figure 1(a), obtained by 
a series of 5 10'' Monte Carlo simulations. By using these Monte Carlo sim- 
ulations, we have evaluated the theoretical 0.95-quantile of Y: yo.95 = 3.66, 
and the correlation coefficient between Y and Z: p = 0.84. The efficiency of 
the CS method is related to the value of the indicator correlation coefficient 
Pi, which can be computed from the simulations and (11): pi = 0.62. We 
compare the CS estimator with the empirical estimator and the CV esti- 
mator of the a-quantile (Figure 1(b)). One can observe that the quantile is 
poorly predicted by the empirical estimator, slightly better predicted by the 
CV estimator, while the CS estimator seems more efficient. 
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Fig. 1. (a.): Pdf ofY ^ f{X) and Z = fr{X) for X Af{0,l) , obtained with 510'^ Monte 
Carlo simulations, (h): Estimation of the a-quantile of the r.v. Y = f{X) from a n-sample, 
with a = 95% and n — 200. The three histograms are obtained from three series of 10* ex- 
periments. The theoretical quantile is ya = 3.66. The mean of the 10 empirical estimations 
is 3.86 and their standard deviation is 0.83. The mean of the 10* CV estimations is 3.74 



and their standard deviation is 0.744. 
their standard deviation is 0.381. 



The mean of the 10 CS estimations is 3.63 and 



3.3. Adaptive controlled stratification (ACS). We first show that there 
exists an optimal choice for the allocation of the simulation points in the 
strata. Let us consider the CS estimation of the cdf of Y as described in 
Section 3.1. Let us fix y € R. The CS estimator (16) of F[y) is asymptotically 
normal and its reduced variance cj^g is given by (18). In fact, if the rounding 
errors are neglected, ctq^/ii is the variance of the CS estimator Fcsiu) ^ov 
any n by (17). 

We note that, if we choose to allocate the simulation points proportionally 
in the strata, that is, we choose f3j = aj — aj-i, then the reduced variance of 
the CS estimator is equivalent to the reduced variance of the PS estimator 
(15). The important point is that this proportional allocation is not efficient, 
as we now show. 

If the number m of strata is fixed, as well as the outpoints (Q;j)j=o,...,m and 
the total number n of simulations, then it is possible to choose the numbers 
of simulations [Pjn] per stratum so as to minimize the variance of the CS esti- 
mator. It is well known that an optimal allocation policy for standard strati- 
fication exists [Fishman (1996), Glasserman, Heidelberger and Shahabuddin 
(1998)]. Here we show that an optimal allocation policy exists for CS, 
in which the strata are determined by the metamodel. By denoting qj = 
(qj — aj_i)^[Pj(y) — Pj{y)], the minimization problem 

{m \ m 

\ , with the constraints Pj > 0, Pj = 1 
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has the unique solution 

/?; 

With this optimal choice the reduced variance of the CS estimator Fcsiu) 
is 

{m 
i=i 

Note that the optimal allocation /3| depends on y in general, which means 
that it is not possible to propose an allocation that is optimal for the esti- 
mation of the whole cdf of Y. However, we can observe the following: 

(1) If the control is weak, then Pj{y) depends weakly on y, and the opti- 
mal allocation is then /3* = 

(2) If the control is strong, then we should allocate more simulations in 
the strata {zaj^i,Zaj] around F{y). 

For instance, let us assume a very strong control, in the sense that Z = 
tp{Y) is an increasing function of Y. Then 

Pj{y) = nY < y\Z e {za,.„Za,]) = nY < y\Y G {i^~\z^^_,),i;-\z^^)]) 

is equal to 1 if Za^ < ipiy) [i-e., aj < F{y)] and to if Zaj__i > ip{y) [i-e., 
aj-i > F{y)]. In these two cases, qj and the optimal P* are zero, and 
all simulations should be allocated in the stratum (z„ . , , z„ . 1 , for which 
Qjo . Of course, the very strong control assumed here is not 
realistic, but this example clearly illustrates the optimal allocation of simu- 
lations in the different strata. 

We now know that there exists an optimal allocation of the n simula- 
tions in the m strata. This allocation depends on the Pj{y), which are the 
quantities that we want to estimate. We can therefore propose an adaptive 
procedure: 

(1) First apply the CS method with n = ri' simulations, 7 G (0, 1), and an 
a priori choice of the f3j . We then obtain a first estimation of the conditional 
probabilities Pj{y)- 

1 [^^■'^1 

(2) Estimate the optimal allocation (3* by 

g {a, - a,-i)[PM - PMr^' 

' j:r=i{o^i-m-i)my)-Pi{yn/'' 



1/2 



e: 



1=1 ' 



1/2 ' 



l,...,m. 
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(3) Carry out the n — n last simulations by allocating the simulations in 
each stratum in order to achieve the estimated optimal number [n[3j\ for all 
3- 

(4) Estimate Pj{y) and F{y) by 

The ACS estimator is unbiased conditionally on Pj > for the j's such 
that Pj > 0. The probability of the complementary event is of the order of 

exp(— cn) which can be usually neglected. The ACS estimator -FACs(y) is 
asymptotically normal, 

V^{FAcsiy) - F{y)) AA(0, ales), 

(22) 

f m 




The expression of the reduced variance c^cs same as (21), which is 

the one of the CS estimator with the optimal allocation The difference 
is that the variance o'leg/n is only reached asymptotically as n ^ 00 in the 
case of the ACS estimator, while the variance is o"ocs/^ ™ case 

of the CS estimator with the optimal allocation. Note that the convergence 
of the ACS estimator is ensured whatever the choice of the positive a priori 
numbers Pj. In practice, a good a priori choice will speed up the convergence. 

We now present an asymptotic analysis of the variance reduction for the 
CV, PS, CS and ACS methods in the case of m = 2 strata. 

In the PS method, or in the CS method if we choose the proportional 
allocation Pj = Oj — aj-i, the reduced variance in (18) is 

(23) als = F{y)il-F{y))[l-p% 

where pi is the correlation coefficient between ly<j, and lz<zc defined by 
(8). (Tp3 is also the reduced variance of the CV estimator (7). 
Hesterberg and Nelson (1998) have already noticed that the PS and CV es- 
timators are equivalent. In the ACS method, the expression of the reduced 
variance <Tacs ™ (22) is 

alcs = F{y){l-F{y))K^ 



K = a 



l + Pi 



(l_«)(l_F(y)) 



1/2' 



1/2 



I- PI 



aF{y) 

\ 1/21 1/2 



(l-a)F(y) \ 
a{l-F{y))) 



1/2' 



1/2 
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I -PI 



Xl-a)F{y)) _ ■ 

If we assume that the correlation coefficient pj is small, then we get the 
following expansion with respect to pr. 



(24) 



„2 



F{y){l-F{y)) 



+ 0{p- 



8Fiy){l-Fiy)) 

These results show that the CV, PS, CS and ACS methods involve a variance 
reduction of the same order when the goal is to estimate the cdf around the 
median F{y) ~ 1/2. However, when the goal is to estimate the cdf tail F{y) ~ 
or 1, the ACS method gives a larger variance reduction. Of course, the CS 
method with a nearly optimal allocation policy gives the same performance 
as the ACS method, but the implementation of this method requires some a 
priori information on the correlation between Y and Z to guess the correct 
allocation, while the ACS method finds it. 

The expressions that we have just derived also give indications for the 
choice of the cutpoint a. Indeed, the variance reduction is all the larger 
as the correlation coefficient pj is larger. For instance, if we assume that 
Z = '4>{Y) is an increasing function of Y, which models a very strong control, 
then one finds 

1/2 



PI 





-F{y)) 


(1- 


a)F{y) 


(1- 


a)F{y) 


a{l 


-F{y)) 



1/2 



if < 

if Za > tp{y). 



As a function of a, this function is maximal when a = F{y). This shows that, 
if the goal is to estimate the cdf of Y around some y, then it is interesting 
to choose a = F{y). 

We can also revisit the asymptotic analysis of the variance reduction 
for the CV, PS, CS and ACS methods in the case of a large number m 
of strata. In the PS method and in the CS method with the proportional 
allocation f3j = Uj — Uj-i, the reduced variance is (18). If the conditional 
probability ¥{Y < y\Z) has a continuous density g* with respect to the 
Lebesgue measure, then (18) is a Riemann sum that has the following limit 
as m ^ oo: 

als = <y\Z)- ¥{Y < y\Zf] = F{y) - E[P(y < y\Zf]. 

This is actually the reduced variance of the optimal CV estimator, when 
the optimal control function g* defined by (12) is used. In the ACS method, 
the expression of the reduced variance o'\q^ is (22), which has the following 
limit as m — > oo: 

dies = E[(P(y < y\Z) - P(y < y\Zff'^]\ 
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The Cauchy-Schwarz inequality clearly shows that the variance reduction 
is larger for the ACS method than for the optimal CV method using the 
optimal control function g* . 

Whatever the value of m > 2, we can also use the Cauchy-Schwarz in- 
equality to check that the reduced variance for the PS method (or for the 
CS method with the proportional allocation) 

m 

is always larger than the reduced variance (22) for the ACS method. 

Finally, it is relevant to estimate the additional computational cost of 
controlled stratification compared to empirical estimation. It is of the order 
of {Nr — n)Tx + NrTf^, where Nr is the number of evaluations of fr and X , 
Tx is the computational time for the generation of a realization of the input 
r.v. X, and Tj^ is the computational time for the call of the function fr- For 
the CS method with the allocation f3j, we have 



m 



E[iV^] = n > '-^ < 



Pi ^ n 



where the last inequality holds uniformly in (3. Besides, Nr has fluctuations 
of the order of ^/n. The same estimate holds true for the ACS method. In 
the real example we have in mind (in which the computational time for the 
function / is 20 minutes), this additional cost is negligible. 

3.4. Quantile estimation by adaptive controlled stratification. In this sec- 
tion we use the ACS strategy to estimate the a-quantile of Y, with a close 
to 1. We propose the following procedure: 

(1) First apply the CS method with n = ri' simulations, 7 G (0, 1), and 
with an a priori allocation policy [3j , so that a first estimate of the conditional 
probabilities Pj{y) can be obtained: 

The corresponding estimators of the cdf and the a-quantile of Y are 

m 

F{y) = J2{aj - a,-i)Pjiy), % = inf{y,F(y) > a]. 
j=i 

(2) Estimate the optimal allocation /3* for the estimated a-quantile Ya 

by 

(25) - a,-i)[PAyo.) - PAyc.?v'' 



minj(aj — aj- 
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(3) Carry out the n — n final simulations by allocating the simulations in 
each stratum in order to achieve the estimated optimal number [/3jn]. 

(4) Estimate Pj{y) and F{y) by 

Pjiy) = 7j—, V^)<„' ^Acs(y) = J2 ^j(y)((^j - 

The ACS estimator of the a-quantile Ua is 

^ACs(a) =inf{y,FACs(y) > a}- 
The estimator YxQ^[a) is asymptotically normal, 

^/^(^Acs(a) -ya)™AA(0,c7ics), 



2 

"^ACS 



To summarize, we have found the following expressions of the reduced 
variance for the different methods: 



for the empirical estimator. 



2 a(l-a) 
<7ee ~ 



• for the PS estimator or for the CV estimator with the proportional allo- 
cation j3j = Uj — Oij-i [see (10)], 



9 a(l ^, , ..^ 



• for the ACS estimator with two strata separated by a 

2 a{l-a) p] , ^, 3. 

Here pi is the correlation coefficient between lyKya ^^'^ ^z<za given by (11). 
This shows that the CV, PS, CS and ACS methods give variance reductions 
of the same order when the goal is to estimate quantiles close to the median 
a ^ 1/2. However, when the goal is to estimate large quantiles q or 1, 
the ACS method is much more efficient. 



3.5. Simulations. We now present a series of numerical simulations that 
illustrate the theoretical results presented in this paper. These examples 
are simple and they are used to validate the ACS method when n = 200, 
a = 0.95 and the reduced model has poor quality (i.e., pi is small). We will 
address in Section 5 a real example in which these conditions hold. 
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Toy example. ID function. Let us revisit the toy example based on the 
ID function (20) and look for the a-quantile of Y with a = 95%. We compare 
the performances of the empirical estimator (2), the CV estimator (9) with 
the control variate Z, and the ACS estimator. The ACS method is first 
implemented with two strata [0, ai] and (ai, 1] with the cutpoint ai = a. We 
use h = 2n/W simulations for the estimation (25) of the optimal allocation, 
with n/10 simulations in each stratum. The results extracted from a series 
of 10000 simulations are summarized in Table 1. Note that the ACS method 
allocates 85% of the simulations in the stratum [0,0.95], and 15% in the 
stratum (0.95,1]. This shows that we deal with a configuration where the 
number of simulations in the stratum (0.95, 1] has to be increased compared 
to the expected value in the case where there is no control, where only 5% 
of the simulations should be allocated in the stratum (0.95, 1]. 

We have also implemented the ACS method with 3 strata [0, ai], (ai, 02], 
(q2, 1], with cutpoints ai = 0.85 and Q2 = 0.95. We use 3n/10 simulations for 
the evaluation (25) of the optimal allocation, with n/10 simulations in each 
of the three strata. The results extracted from a series of 10000 simulations 
are presented in Table 1. The variance reduction for the ACS method with 
three strata is very important. The standard deviation of the ACS estimator 
is 3 times smaller compared to the empirical estimator of the CV estimator. 
Note that the optimal allocation should attribute a fraction /?i < 0.1 to the 
first stratum, but the number 0.1 cannot be lowered due to the fact that 
n/10 simulations in the first stratum [0,ai] were already used in the first 
step of the estimation. 

The previous simulations were carried out with the sample size n = 2000. 
In such a case, the ACS method is robust, in the sense that the choice of the 
number n of simulations devoted to the estimation of the optimal allocation 
is not critical. When n is smaller, such as n = 200, then the choice of n be- 
comes critical: if h is too small, then the estimation of the optimal allocation 





Table 1 






Estimation of the 


a-quantile with a = 


0.95, n = 2000, 


= 3.66 


Method 


Quantity 


Mean 


Standard deviation 


Empirical estimation 




3.66 


0.33 


CV estimation 




3.65 


0.29 


ACS method with 2 strata 




0.86 


0.02 


[0,0.95], (0.95,1] 


Vacs (a) 


3.65 


0.28 


ACS method with 3 strata 


Pi 


0.10 


0.02 


[0,0.85], (0.85,0.95], (0.95,1] 


P2 


0.58 


0.02 






0.32 


0.01 




V\cs(a) 


3.65 


0.12 
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Table 2 






Estimation of the 


a-quantile with a 


= 0.95, 71 = 200, 


Ua = 3.66 


Method 


Quantity 


Mean 


Standard deviation 


Empirical estimation 




3.88 


0.83 


CV estimation 


ycv(Q) 


3.73 


0.74 


ACS method with 3 strata 


Pi 


0.14 


0.16 


[0,0.85], (0.85,0.95], (0.95,1] 




0.55 


0.11 






0.31 


0.06 




Vacs (a) 


3.62 


0.38 



may fail during the first step of the ACS method; if n is too large, then n — h 
may be too small and it may be impossible to allocate the estimated optimal 
number of simulations to each stratum during the second step of the ACS 
method. We have applied the ACS method with n = 200 to the example 
1, and it turns out that the ACS method with h = n/10 is still efficient. 
However, we cannot claim that this will be the case for all applications. 
The results obtained from a series of 10000 simulations are summarized in 
Table 2. The standard deviations of the estimations of the /3j's are much 
larger than in the case n = 2000, but the quality of the estimation of the 
optimal allocation is just good enough to allow for a significant variance 
reduction for the quantile estimation. The standard deviation of the ACS 
estimator is here 2 times smaller compared to the empirical estimator or 
the CV estimator. If n = 100, then the ACS method fails (in the sense that 
some simulations give /?i = 0), and the CS method with an allocation of the 
simulation points prescribed by the user should be chosen. 

4. Quantile estimation with Controlled Importance Sampling (CIS). We 

consider the same problem as in the previous sections. In this section we 
show that the reduced model can be used to help design a biased distri- 
bution of the input r.v. X in order to implement an efficient importance 
sampling (IS) strategy. The standard IS method consists in simulating the 
n-sample of the r.v. X according to a biased distribution, and to multiply 
the output by a likelihood ratio to recover an unbiased estimator. In the 
case in which the biased distribution favors the occurrence of the event of 
importance, the variance of the estimator can be drastically reduced com- 
pared to the standard empirical IVIonte Carlo estimator. Adaptive versions 
of the IS procedure have been proposed and studied, whose principle is to 
estimate first a "good" biased distribution that is to say, a distribution that 
properly favors the occurrence of the event of importance, before using this 
biased distribution as in the standard IS estimation. We will propose an 
estimator of the cdf of Y first, then an estimator of the a-quantile of y, by 
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a controlled importance sampling (CIS) procedure. In this procedure, the 
reduced model fr and the associated r.v. Z = fr{X) are used to determine 
the biased distribution, while the complete model / is used to perform the 
estimation. 

4.1. Estimation of the distribution function. An IS estimator of the cdf 

of y = f{x) is 

(26) My) = lthix.)<y'^. X.^g, 

where gori is the original pdf of X and q is the biased pdf chosen by the user. 
In practice, it can be useful to use a variant in which the denominator n in 

(26) is replaced by J2i'=iQoniXi)/q{Xi), in order to enforce -Fis(y) — > 1 as 
y — > oo. Other alternatives can be found in Hesterberg (1995). The estimator 
Fis{y) converges almost surely to F{y) when n oo. In fact, the estimator 
Fis{y) is unbiased as soon bs the support of the origiiicil pdf ^ori 

is included 

in the support of the biased pdf q. The variance of (26) is given by 

(27) Var[Fis(y)] = ^ (/ ^/(-)^^^^^°"(^)' dx - ^(y)^) . 

The IS can involve a dramatic variance reduction compared to the standard 
empirical estimator if the biased pdf q is properly chosen. The variance of 
Fisiu) is minimal [Rubinstein (1981)] when the biased pdf is taken to be 
equal to the optimal pdf defined by 

(28) q*{x)= 

J ^f{x')<yQori{x')dx' 

This result cannot be directly used to perform the simulations because the 
normalizing constant of the optimal pdf is the quantity that is sought. How- 
ever, this remark gives the basis of an adaptive procedure where the optimal 
density is estimated. 

The parametric approach for the adaptive IS approach is the following 
one. We first choose a family of pdfs Q = {q^y'T^ G T} and we then try to 
estimate the parameter 7. In this section we assume that the family of 
pdfs Q is parameterized by the first two moments: 7 = (A, C) : A G M"' is the 
expectation and C € Md{^) is the covariance matrix of X when the pdf of 
X is q-y. 

The strategy to determine the best biased density in the family Q is 
based on the following remark. The theoretical optimal density is q* and 
it is given by (28). The expectation and covariance matrix of the random 
vector X under q* are 

. . ^ fxlf(^\<Ci,qoriix)dx ^ f XX^lf(r)<i,qoriix) dx ^ 

29 A*= , /W<i/'^°m 7 C* = —, /W<;/'j°"V yyt^ 

J ^f{x)<yQori{x)dx J 1 f^^)<yqon{x) dx 
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The idea is to choose in the family Q the pdf q-^ which has expectation 
A* and covariance matrix C*, that is to say, we choose the pdf q^* with 
^* = (A*,C*). 

The problem is now reduced to the estimation of A* and C* . If we assume 
that the reduced model is so cheap that we can use as many simulations 
based on as desired, then we can estimate A* and C* by 



(30) 



^ ^ Ya=\ ^^lZi<^;go^i(^^)/go(^^) 
Etllz,<j;gori(X,)/(/o(X,) ' 

Ef=llz.<,gon(X,)/go(^.) 



Xi ~ qo, 



where qo is an a priori pdf chosen by the user. If no a priori information is 
available, then the choice qo = gori is natural. The estimators A and C are 
well defined on IJiLil-^i ^ u}- For completeness, we can set A = and C = Id 
on the complementary event, whose probability is of the form exp(— cn). As 
n ^ oo , the estimators A and C converge almost surely to 



A,, — — p- ■: — r — ; aUCl 0„ — p- ■: — r — A, 

J ^ Mx)<yQori{x) dx j lj^(^)<ygori(x)d2; 



■.t 

r 1 



which are close to A* and C* if fr is a good enough reduced model. The 
variance of the CIS estimator of F{y) using the biased pdf determined by 
the metamodel is (27) with q = qj*, 7* = (A*,C*). Fcis is asymptotically 
normal, 

V^iFcisiy) - Fiy)) "^AA(0,4ig), 

•J Q-yf \^) 

Note that the selected biased pdf depends on y. Indeed, it is not possible to 
propose a biased pdf that is efficient for all values of y. This is not surprising, 
since the principle of the IS method is to favor the realizations that probe 
a specific region of the state space that is important for the target function 
whose expectation is sought (here, x i— > l^(j.)<j^). 

4.2. Quantile estimation. In this subsection we look for the a-quantile 
of Y. The CIS strategy consists in determining a biased pdf that is efficient 
for the estimation of the expectation 

(31) E[lj^^x)<zJ = J 'i-Mx)<zMix) dx = a, 

where fr is the reduced model and Za is the a-quantile of Z, which is assumed 
to be known. The determination of a biased pdf q that minimizes the IS 
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estimator of the quantity (31) will give a pdf that probes the important 
regions for the estimation of the a-quantile of Z, and also of the a-quantile 
of Y if the reduced model is correlated to the complete computer model. 

As in the previous subsection, we will look for the biased pdf in a family 
Q of pdfs parameterized by the first two moments 7 = (A,C). By using 
only the reduced model, we estimate the parameter 7 with the estimator (30) 
with y = Za- Next we apply the IS estimator (26) of the cdf of Y by using the 
complete model and the biased density q^, 7 = (A, C). Finally, the estimator 
of the a-quantile is Ycis{a) = inf{y, -Fis(y) > a}. It is asymptotically normal, 

VniYcisia) - Va) ™ AA(0,(j^is), 



In the case where the reduced model is not so cheap, adaptive IS strategies 
can be used with the reduced model to estimate the parameters of the bi- 
ased density [Oh and Berger (1992)]: roughly speaking, at generation k, the 
parameter 7^ is estimated by using a standard IS strategy using the biased 
pdf 7fc_i obtained during the computations of the previous generation. 

4.3. Simulations. Let us consider the case where X = (Xi,X2) is a ran- 
dom vector with independent Gaussian entries with zero mean and variance 
one. The functions / and are given by 

(32) fr{x) = \xi\xi + X2, 

/(x) = 0.95|xi|2;i[l + 0.5cos(10a;i) -h0.5cos(20xi)] 

(33) 

+ 0.7x2[l + 0.4cos(2;2) +0.3cos(14x2)]. 

The pdf of y = f{X) and Z = fr{X) are plotted in Figure 2(a). By using 
Monte Carlo simulations, we have evaluated the correlation coefficient be- 
tween Y and Z: p = 0.90. From (11), we have also evaluated the indicator 
correlation coefficient: pj = 0.64. The empirical estimator and the CIS es- 
timator of the a-quantile of Y are compared in Figure 2(b). The family Q 
consists of the set of two-dimensional Gaussian pdfs parameterized by their 
means and covariance matrices. The comparison is also made with the CV 
estimator and the CS estimator and it appears that the variance of the CIS 
estimator is significantly smaller than the one of the other estimators. 

CIS is the best strategy in this example. However, CIS (in the present 
version) is successful only when one unique important region exists in the 
state space. For instance, in the case of the ID function treated in the pre- 
vious sections (where there are two equally important regions far away from 
each other due to the parity of the function /), the CIS strategy fails in the 
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Fig. 2. (-A.): PdfofY = f{Xi,X2) and Z = fr{Xi,X2) for Xi, X2 Af{0,l) ■ (b): Es- 
timation of the a-quantile of Y from a n-sample, with a — 0.95 and n = 200. The four 
histograms are obtained from four series of 5000 experiments. The mean of the empirical 
estimations is 2.83 and their standard deviation is 0.52. The mean of the CV estimations is 
2.74 and their standard deviation is 0.38. The mean of the CS estimations is 2.71 and their 
standard deviation is 0.25. The mean of the CIS estimations is 2.77 and their standard 
deviation is 0.21. The theoretical quantile (obtained from a series 0/ 5 10^ simulations) is 
^ 2.75. 



sense that the algorithm to determine the biased pdf does not converge. The 
use of mixed pdf models should be considered to overcome this limitation 
and will be the subject of further research. 

5. Application to a nuclear safety problem. In this section we apply the 
controlled stratification and controlled importance sampling methodologies 
on a complex computer model used for nuclear reactor safety. It simulates a 
hypothetic thermal-hydraulic scenario: a large-break loss of coolant accident 
for which the quantity of interest is the peak cladding temperature. This sce- 
nario is part of the Benchmark for Uncertainty Analysis in Best-Estimate 
Modeling for Design, Operation and Safety Analysis of Light Water Reac- 
tors [Petruzzi et al. (2004)] proposed by the Nuclear Energy Agency of the 
Organisation for Economic Co-operation and Development (OCDE/NEA). 
It has been implemented on the computer code Cathare of the Commissariat 
a I'Energie Atomique (CEA). In this exercise the 0.95-quantile of the peak 
cladding temperature has to be estimated with less than 250 computations 
of the computer model. The CPU time is twenty minutes for each simu- 
lation. The complexity of the computer model lies in the high-dimensional 
input space: 53 random input parameters (physical laws essentially, but also 
initial conditions, material properties and geometrical modeling) are consid- 
ered, with normal and log-normal distributions. This number is rather large 
for the metamodel construction problem. 
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Screening and linear regression strategy. To simplify the problem, we ap- 
ply first a screening technique, based on a supersaturated design [Lin (1993)] 
with 30 numerical experiments. This leads to the determination of the five 
most influential input parameters {UO2 conductivity -'^ig, film boiling heat 
transfer coefficient X44, axial peaking factor Xg, critical heat flux X42 and 
UO2 specific heat X2q)- Then a stepwise regression procedure has been ap- 
plied on the 30 experiments to obtain five additional input parameters and 
a linear regression procedure allows us to obtain a coarse linear metamodel 
of degree one: 

fr{X) = 660.3 - 61.79X2 + 6.141^6 + 589.9Xg + 80.82Xii - 404.5Xi9 

+ 264.2^20 - 27.O6X35 + 6.I6IX37 - 255.7X42 - 31.99X44. 

Note that the present strategy for the metamodel construction is relatively 
basic and not devoted to maximize p/. Other strategies based on pe- 
nalization techniques such as Lasso (least absolute shrinkage and selection 
operator) could be considered to fit the regression model [Tibshirani (1996), 
Hastie, Tibshirani and Friedman (2001)]. 

Controlled stratification. A first test with controlled stratification with 
200 simulations was performed, which gave the following quantile estima- 
tion: 928° C with a bootstrap-estimated standard deviation of 7°C, while 
the quantile estimation from the metamodel is 932° C. A second test with 
controlled stratification with 200 simulations was performed, which gave the 
following quantile estimation: 929° C with a bootstrap-estimated standard 
deviation of 10° C. 

Controlled importance sampling. A biased distribution for the 10 impor- 
tant parameters of the metamodel has been obtained as follows. The original 
distributions of these independent parameters are normal or log-normal. We 
have considered a parametric family of biased pdfs with the same forms as 
the original ones, and we have selected their means and variances by (30) 
with y = Za and go equals to the original pdf. A first test with controlled 
importance sampling with 200 simulations was performed, which gave the 
following quantile estimation: 929° C with a bootstrap-estimated standard 
deviation of 10° C, while the quantile estimation from the metamodel is 
932° C. A second test with controlled importance sampling with 200 simu- 
lations was performed, which gave the following quantile estimation: 924° C 
with a bootstrap-estimated standard deviation of 8°C. 

Empirical estimation. A test sample of 1000 additional computations 
(with input parameters chosen randomly) was then carried out. We have 
first used this random sample to check the quality of the metamodel. We 
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have found that p = 0.66, B? = 0.09 and pi = 0.54, which shows that the 
metamodel has poor quahty (as it could have been expected). We also used 
the random sample to get empirical estimations of the quantile. For the full 
sample n = 1000 the empirical quantile estimation is 928° C with a standard 
deviation of 6°C. For n = 200 the empirical quantile estimation is 926° C with 
a standard deviation of 12°C. It thus appears that the controlled stratifica- 
tion estimator and the controlled importance sampling estimator performed 
with 230 simulations (30 for the screening and 200 for the controlled estima- 
tion) have better performances than the empirical estimator with 200 sim- 
ulations, and have performances close to the empirical estimator with 1000 
simulations. This shows that controlled stratification and controlled impor- 
tance sampling can be used to substantially reduce the variance of quantile 
estimation, in the case in which a small number of simulations is allowed 
but a reduced cheap model is available, even if this reduced model has poor 
quality. 

Gaussian process (Gp) strategies. In order to compare our approach 
(crude initial screening step before controlled stratification step) to a strat- 
egy including a more involved metamodel construction step, we propose 
to show some results obtained with a Gp approach [Sacks et al. (1989), 
Schonlau and Welch (2005)]. 

• First, we perform a numerical experimental design of 200 Cathare code 
simulations. We choose a maximin Latin hypercube sampling design, well 
adapted to the Gp model construction [Fang, Li and Sudjianto (2006)]. 
This difficult fit (due to the high dimensionality and small database) can 
be realized thanks to the algorithm of Marrel et al. (2008), specifically 
devoted to this situation. The obtained Gp model contains a linear re- 
gression part (including 15 input variables) and a generalized exponential 
covariance part (including 7 input variables). We use the test sample of 
1000 additional computations to check the predictor quality of this new 
metamodel: p = 0.84, = 0.70 and pi = 0.73. As expected, the quality of 
this Gp model is much higher than the crude one. However, a brute-force 
Monte Carlo estimation (with 10^ computations) of the 0.95-quantile us- 
ing the predictor of this metamodel gives 917°C, which underestimates 
the "true" quantile (928° C). A better strategy, which could be applied in 
a future work, would be to choose sequentially the specific design points 
to improve the Gp fit around the quantile, as in Oakley (2004). 

• As a second comparison, we propose to perform the controlled stratifica- 
tion process with the predictor of a Gp model. We fit a Gp model with a 
smaller number of runs than the previous one, keeping other runs for the 
controlled stratification step. We choose a maximin Latin hypercube sam- 
pling design with 100 design points. Below this sampling size, Gp fitting 
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Table 3 

Estimation of the 0.95-quantile for the nuclear safety problem. Gp(lOO) [resp., Gp(200)] 
is the Gp model estimated from 100 (resp., 200) design points. Mm(30) is the metamodel 
fr estimated from 30 numerical experiments 



Method 




Quantile 
estimation 


Standard deviation 
estimated by bootstrap 


EE from code (n = 1000) 




928 


6 


EE from code (n = 200) 




926 


12 


EE from Mm(30) {n = 10*^) 




932 


~0 


EE from Gp(lOO) (n = 10*^) 




912 


~0 


EE from Gp(200) {n = lO'^) 




917 


~0 


CS with Mm(30) test 1 (n = 


200) 


928 


7 


CS with Mm(30) test 2 (n = 


200) 


929 


10 


CS with Gp(lOO) (n = 200) 




917 


9 


CIS test 1 (n = 200) 




929 


10 


CIS test 2 (n = 200) 




924 


8 



becomes unfeasible because of the large dimensionality of our problem 
(53 input variables). The obtained Gp model contains a linear regression 
part (including 7 input variables) and a generalized exponential covari- 
ance part (including 6 input variables). The quality of this Gp model is 
measured via the test sample and gives p = 0.82, i?2 = 0.66 and pi = 0.37. 
The Gp predictivity is rather good but, compared to the previous one, the 
Pi value shows a strong deterioration around the 0.95-quantile (the Gp 
model 0.95-quantile is 912° C). Using the predictor of this Gp model, the 
controlled stratification with 200 simulations gives the following quantile 
estimation: 917°C with a standard deviation of 9°C. This relatively poor 
and biased result confirms the importance of pi in the controlled stratifica- 
tion process: quantile estimation with a coarse metamodel (linear model 
of degree one with = 0.09), but adequate near the quantile region, 
gives better results than quantile estimation with a refined metamodel 
(Gp model with = 0.66), but inadequate near the quantile region. 

Table 3 summarizes all the results we have shown in this section. Other 
experiments, that will be shown in a future paper, have been made to com- 
pare different choices about the strata (number and locations). 

6. Conclusion. In this paper we have proposed and discussed variance 
reduction techniques for estimating the a-quantile of a real-valued r.v. Y in 
the case in which: 

• the r.v. Y = f(X) is the output of a CPU time expensive computer code 
with random input X, 
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• the auxiliary r.v. Z = fr{X) can be used at essentially free cost, where fr 
is a metamodel that is a coarse approximation of /. 

Our goal was to exploit the metamodel to obtain better control variates, 
stratification or importance sampling than could be obtained without it. 

First, we have presented already known variance reduction techniques 
based on the use of Z as a control variate (CV). The CV methods allow a 
variance reduction of the quantile estimator by associating to each of the 
n simulations Yi = f{Xi) weights that depend on Zi = fr{Xi). In the CV 
methods, n realizations (-'^j)i=i,...,n are generated and the corresponding n 
outputs f{Xi) and fr{Xi) are computed. 

Second, we have developed an original controlled stratification (CS) method, 
that consists in accepting/rejecting the realizations of the input X based on 
the values of fr{X). A large number of realizations of the input X and a large 
number of evaluations of fr are used, compared to the CV methods, but the 
number of evaluations of the complete model / is fixed. In the adaptive con- 
trolled stratification (ACS) method, the realizations of the random input Xi 
are sampled in strata determined by the reduced model fr, and the number of 
simulations allocated to each stratum is optimized dynamically. The variance 
reduction can be very substantial. By a theoretical analysis of the asymptotic 
variance of the estimator and by numerical simulations, we have found that, 
if n is large enough, the ACS method is the most efficient one. Note that the a 
priori choice of the parameters for the CS and ACS strategies (choice of n, m 
and Pj) plays no role in the asymptotic regime n oo. However, for n = 200, 
for instance, it plays a primary role. In this paper a toy example with a meta- 
model that has the same quality (in terms of correlation coefficients) as the 
one we have in the real example has been used to validate the parameters of 
the CS strategy. For the time being, we have the feeling that it is the only rea- 
sonable strategy when n is not large enough to apply the asymptotic results. 

Third, a controlled importance sampling (CIS) strategy has been ana- 
lyzed, where the biased pdf for the CIS estimator is estimated by intensive 
simulations using only the reduced model. The variance reduction can be 
significant. However, an important condition in the present version is that 
only one important region exists in the state space. The use of mixed pdf 
models should be considered to overcome this limitation. 

The methods presented in this paper suppose the availability of a re- 
duced model or a metamodel. If it is not available, then the construction of 
a metamodel using linear regression strategies or Gaussian process strate- 
gies or L^-penalization strategies is necessary. However, it seems to be suf- 
ficient to have a crude approximation of the computer model. In industrial 
practice, it is often the case due to the nonlinear effects, the high dimen- 
sionality of the inputs and the limited numbers of computer experiments 
[Fang, Li and Sudjianto (2006), Volkova, looss and Van Dorpe (2008)]. We 
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can note also that a great advantage of these methods is that it is very easy 
to carry out the simulations on a parallel computer, with as many nodes as 
calls of the complex code /. One possible investigative way to improve our 
quantile estimation strategies for the applications would be to optimize the 
number of runs devoted to the metamodel construction and the number of 
runs devoted to the quantile estimation. Furthermore, the computer runs of 
the first step of the ACS method can also serve to update the metamodel; 
then this refined metamodel can be used in the second step. A further im- 
provement would be to update the metamodel fr as one obtains more values 
of / (at least occasionally) during the second step. However, this strategy 
goes against the parallelization of the method and one should be cautious 
and conservative in order to avoid bias, but it is certainly an interesting 
direction of research. 

The different tests performed on our industrial application have shown 
that the metamodel quality has to be sufficient near the quantile region. 
The quality criterion pj has been identified as a good measure of the po- 
tential performance of the controlled stratification process. Another quan- 
tile estimation technique, the sequential construction of a Gaussian process 
model [Oakley (2004), Ranjan, Bingham and Michailidis (2008)], is devoted 
to optimizing the metamodel construction near the quantile region. As a 
perspective of our work, we will try to apply this technique to our high- 
dimensional application. 
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